{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# BeautifulSoup使用"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 一、安装导入"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[官方文档传送门](https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```bash\n",
    "$ pip3 install bs4\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bs4 import BeautifulSoup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 二、BeautifulSoup对象"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Beautiful Soup将复杂HTML文档转换成一个复杂的树形结构,每个节点都是Python对象,所有对象可以归纳为4种: \n",
    "\n",
    "* Tag对象，与html原生tag相同。tag有名称（Name），属性(Attributes)。\n",
    "* NavigableString对象，可以遍历的字符串。\n",
    "* BeautifulSoup对象，表示一个文档的全部内容\n",
    "* Comment对象，是一个特殊类型的NavigableString对象，注释及特殊字符串。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 通过文件创建BeautifulSoup对象，默认使用lxml作为解析器，推荐\n",
    "soup=BeautifulSoup(open('file/index.html'),'lxml')\n",
    "soup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "html_doc = \"\"\"\n",
    "<!DOCTYPE html>\n",
    "<html><head><title>test</title></head>\n",
    "<body>\n",
    "<p class=\"story\">Once upon a time there were three little sisters; and their names were\n",
    "<a href=\"http://example.com/elsie\" class=\"sister\" id=\"link1\">Elsie</a>,\n",
    "<a href=\"http://example.com/lacie\" class=\"sister\" id=\"link2\">Lacie</a> and\n",
    "<a href=\"http://example.com/tillie\" class=\"sister\" id=\"link3\">Tillie</a>;\n",
    "and they lived at the bottom of a well.</p>\n",
    "<b class=\"boldest\" id=\"1\">Extremely bold</b>\n",
    "<h1><!--Hey, buddy. Want to buy a used parser?--></h1>\n",
    "</body>\n",
    "</html>\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过字符串创建BeautifulSoup对象\n",
    "soup = BeautifulSoup(html_doc)\n",
    "soup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 查看BeautifulSoup对象的name\n",
    "soup.name"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 通过BeautifulSoup对象获取Tag对象、NavigableString对象、Comment对象"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取Tag对象\n",
    "soup.title"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可以循环调用，当循环调用时，其实可以把BeautifulSoup对象当做Tag对象\n",
    "soup.head.title"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 查看对象类别\n",
    "type(soup.title)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取NavigableString对象\n",
    "soup.title.string"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 查看对象类别\n",
    "type(soup.title.string)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取Comment对象\n",
    "comment = soup.h1.string\n",
    "comment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 查看对象类别\n",
    "type(comment)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 三、Tag对象操作"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tag=soup.b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 获取Tag对象的某个属性\n",
    "tag['class']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 和上面等价\n",
    "tag.get('class')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取Tag对象的所有属性\n",
    "tag.attrs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 删除Tag对象的某个属性\n",
    "del tag['id']\n",
    "tag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 修改Tag对象的某个属性\n",
    "tag['id']=2\n",
    "tag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 查找匹配某个属性的标签\n",
    "soup.find(id='link1')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# class是python关键字，因此以class_代替\n",
    "soup.findAll(class_='sister')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "for link in soup.findAll('a'):\n",
    "    print(link.get('href'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 四、遍历文档树"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1、子节点"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 一个Tag可能包含多个字符串或其它的Tag,这些都是这个Tag的子节点\n",
    "body_tag = soup.body\n",
    "body_tag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tag的contents属性可以将tag的子节点以列表的方式输出\n",
    "body_tag.contents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 查看列表长度\n",
    "len(body_tag.contents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tag的children生成器,可以对tag的子节点进行循环\n",
    "body_tag.children"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for child in body_tag.children:\n",
    "    print(child)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# .contents 和 .children 属性仅包含tag的直接子节点，descendants会递归所有子孙节点\n",
    "for child in body_tag.descendants:\n",
    "    print(child)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "title_tag=soup.head\n",
    "title_tag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 如果tag只有一个 NavigableString 类型子节点,那么这个tag可以使用 .string 得到子节点\n",
    "title_tag.string"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 如有多个，可以使用strings得到所有NavigableString子节点\n",
    "for str in body_tag.strings:\n",
    "    print(str)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2、父节点"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "title_tag = soup.title\n",
    "title_tag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 获取tag的父节点\n",
    "title_tag.parent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 父节点可能是Tag对象(上一行)，可能是BeautifulSoup对象（本行），也有可能是None（下一行）\n",
    "html_tag = soup.html\n",
    "type(html_tag.parent)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(soup.parent)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过元素的 .parents 属性可以递归得到元素的所有父辈节点\n",
    "for parent in title_tag.parents:\n",
    "    if parent is None:\n",
    "        print(parent)\n",
    "    else:\n",
    "        print(parent.name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3、兄弟节点"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b_tag=soup.b\n",
    "b_tag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 有换行符的缘故\n",
    "b_tag.next_sibling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 下一个的下一个兄弟节点\n",
    "b_tag.next_sibling.next_sibling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 上一个的上一个兄弟节点\n",
    "b_tag.previous_sibling.previous_sibling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过 .next_siblings 和 .previous_siblings 属性可以对当前节点的兄弟节点迭代输出\n",
    "for sibling in b_tag.next_siblings:\n",
    "    print(repr(sibling))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4、回退和前进"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "title_tag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 没有上个兄弟节点，因此为空\n",
    "title_tag.previousSibling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 上一个解析的对象（本行是Tag对象，下一行是字符串）\n",
    "title_tag.previous_element"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 下一个解析的对象\n",
    "title_tag.next_element"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for element in title_tag.previous_elements:\n",
    "    print(repr(element))\n",
    "    print('')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 五、搜索文档树"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "### 1、过滤器\n",
    "find_all需要传入过滤器，过滤器可以被用在tag的name中,节点的属性中,字符串中或他们的混合中"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 1、最简单的过滤器是字符串\n",
    "# 查找文档中所有的<a>标签\n",
    "soup.find_all('a')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 2、过滤器是正则表达式\n",
    "# 查找文档中所有的b开头的标签\n",
    "import re\n",
    "for tag in soup.find_all(re.compile(\"^b\")):\n",
    "    print(tag.name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 查找文档中所有的包含t的标签\n",
    "for tag in soup.find_all(re.compile(\"t\")):\n",
    "    print(tag.name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 3、过滤器是列表\n",
    "# 查找文档中所有的<a>和<b>标签\n",
    "soup.find_all([\"a\", \"b\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 4、过滤器可以是True，可以匹配任何值,查找到所有的tag,但是不会返回字符串节点\n",
    "for tag in soup.find_all(True):\n",
    "    print(tag.name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 5、过滤器可以是函数，查看包含 class 和 id 属性的Tag\n",
    "def has_class_and_id(tag):\n",
    "    return tag.has_attr('class') and tag.has_attr('id')\n",
    "\n",
    "soup.find_all(has_class_and_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2、find_all函数格式：\n",
    "find_all( name , attrs , recursive , text , **kwargs )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1、name参数\n",
    "soup.find_all('b')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 2、kwargs参数\n",
    "soup.find_all(id='link2')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 也可以使用正则\n",
    "soup.find_all(href=re.compile(\"elsie\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 可以混用，name参数和是多个kwargs参数的组合\n",
    "soup.find_all('a',id='link1',href=re.compile(\"elsie\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 查找有id属性的Tag\n",
    "soup.find_all(id=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 按CSS查询\n",
    "soup.find_all(\"a\", class_=\"sister\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "soup.find_all(class_=re.compile(\"ist\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def has_six_characters(css_class):\n",
    "    return css_class is not None and len(css_class) == 6\n",
    "\n",
    "soup.find_all(class_=has_six_characters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 3、text参数，可以搜索文档中的字符串内容\n",
    "soup.find_all(text=[\"Tillie\", \"Elsie\", \"Lacie\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 4、limit参数，限制返回的个数\n",
    "soup.find_all(\"a\", limit=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 5、recursive参数，默认为非直接子节点。可指定为False查找直接子节点\n",
    "soup.find_all(\"a\", recursive=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# p节点下有a直接子节点\n",
    "soup.p.find_all(\"a\", recursive=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "soup.find_all(\"a\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 等价于上一行\n",
    "soup(\"a\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3、find函数格式\n",
    "find( name , attrs , recursive , text , **kwargs )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 和find_all的limit=1等同\n",
    "# soup.find_all('title', limit=1)[0]\n",
    "soup.find('title')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4、find_next_siblings() 和 find_next_sibling()格式\n",
    "find_next_siblings( name , attrs , recursive , text , **kwargs )\n",
    "\n",
    "find_next_sibling( name , attrs , recursive , text , **kwargs )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### find_previous_siblings() 和 find_previous_sibling()¶格式\n",
    "find_previous_siblings( name , attrs , recursive , text , **kwargs )\n",
    "\n",
    "find_previous_sibling( name , attrs , recursive , text , **kwargs )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这4个方法通过 .previous_siblings 和.next_siblings属性对tag的前后解析"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_link = soup.a\n",
    "first_link"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_link.find_next_siblings(\"a\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_link.find_next_sibling(\"a\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5、find_all_next() 和 find_next()格式\n",
    "find_all_next( name , attrs , recursive , text , **kwargs )\n",
    "\n",
    "find_next( name , attrs , recursive , text , **kwargs )\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### find_all_previous() 和 find_previous()格式\n",
    "find_all_previous( name , attrs , recursive , text , **kwargs )\n",
    "\n",
    "find_previous( name , attrs , recursive , text , **kwargs )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这4个方法通过 .previous_elements 和.next_elements 属性对tag进行前后解析"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_link = soup.a\n",
    "first_link"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "first_link.find_all_previous(\"title\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6、CSS选择器\n",
    "Beautiful Soup支持大部分的CSS选择器,在 Tag 或 BeautifulSoup 对象的 .select() 方法中传入字符串参数,即可使用CSS选择器的语法找到tag"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过title直接查找\n",
    "soup.select(\"title\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过tag标签逐层查找\n",
    "soup.select(\"body a\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 找到某个tag标签下的直接子标签\n",
    "soup.select(\"head > title\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 找到兄弟节点标签\n",
    "soup.select(\"#link1 ~ .sister\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "soup.select(\"#link1 + .sister\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过CSS的类名查找\n",
    "soup.select(\".sister\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过tag的id查找\n",
    "soup.select(\"a#link2\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过属性的值来查找\n",
    "soup.select('a[href^=\"http://example.com/\"]')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 六、输出"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 格式化输出\n",
    "# prettify() 方法将Beautiful Soup的文档树格式化后以Unicode编码输出,每个XML/HTML标签都独占一行\n",
    "print(soup.prettify())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 压缩输出\n",
    "str(soup)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
